Gene.bordegaray/2026/02/dyn filter partition indexed#20246
Gene.bordegaray/2026/02/dyn filter partition indexed#20246gene-bordegaray wants to merge 7 commits intoapache:mainfrom
Conversation
NGA-TRAN
left a comment
There was a problem hiding this comment.
The PR makes sense. My main comments:
- You may add document about pruning and filtering and why we need partition-index if not yet documented somewhere
- I think there is a typo in test data
- I am unclear whether recursively searching RepartitionExec is the best strategy and whether it will introduce new bug. Maybe some examples and explanation will help.
- I would ask someone that know dynamic filtering well to review this. Adrian and Lia? Maybe they will help explain the recursive walk, too
…02/dyn_filter_partition_indexed
LiaCastaneda
left a comment
There was a problem hiding this comment.
This makes sense to me and will be very helpful for use cases where we want to avoid repartitioning data. My only concern is that API users would need to align the probe and build side partitions, but this seems like a reasonable tradeoff. Let’s see what other contributors think.
| /// CASE-hash routing, but this assumes build/probe partition indices stay aligned for | ||
| /// dynamic filter consumers. |
There was a problem hiding this comment.
| /// CASE-hash routing, but this assumes build/probe partition indices stay aligned for | |
| /// dynamic filter consumers. | |
| /// CASE-hash routing, but this assumes build/probe partition indices stay aligned, otherwise the query might have correctness problems. |
I would also add a small example to clarify what “aligned” means -- for example, ranges 0–5 on partition 0 on both the build and probe sides, and so on.
There was a problem hiding this comment.
Agreed. Example will epxlain things clearer here. Monodraw is a good tool for this
| plan = if let Some(hash_join) = plan.as_any().downcast_ref::<HashJoinExec>() | ||
| && matches!(hash_join.mode, PartitionMode::Partitioned) | ||
| { | ||
| let routing_mode = |
There was a problem hiding this comment.
For a Partitioned join, is it possible to have one side perserving partitioning and not the other (have RepartitionExec on one side only)? This would be a misuse from the API? if so, should we throw an error?
NGA-TRAN
left a comment
There was a problem hiding this comment.
Definitely the right approach. I let Lia and DF committers review the details. Great work, Gene
| /// CASE-hash routing, but this assumes build/probe partition indices stay aligned for | ||
| /// dynamic filter consumers. |
There was a problem hiding this comment.
Agreed. Example will epxlain things clearer here. Monodraw is a good tool for this
| && dynamic.has_partitioned_filters() | ||
| { | ||
| let snapshot = dynamic.current_for_partition(partition)?; | ||
| return Ok(Transformed::yes(snapshot)); |
There was a problem hiding this comment.
If there are 2 joins in the plan like this and dynamic filtering is turned on for both of them but they will be on different partitioning expressions. Will this transform stop and get the right expression? Adding some tests for those cases will help verify that and uncover bugs if any
Join2
/ \
repartition T3
/
Join2
/ \
T1 T2
There was a problem hiding this comment.
For this test, you can use SQLlogic to test it.
There was a problem hiding this comment.
Ok, I think a sql logic and a integration test will be good. I want to check the filter itself and the mode that got selected in this case
Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?